Optimizing Stencil Computations for NVIDIA Kepler GPUs

Authors

Naoya Maruyama

Takayuki Aoki

Abstract

We present a series of optimization techniques for stencil computations on NVIDIA Kepler GPUs. Stencil computations on regular grids have been ported to older generations of NVIDIA GPUs with significant performance improvements, owing to the GPUs' higher memory bandwidth compared with conventional CPU-only systems. However, we show that because of the architectural changes introduced in Kepler, the latest generation of NVIDIA's GPU architecture, implementation strategies developed for those older GPUs are not as effective on Kepler as they were before. To fully exploit Kepler's performance potential, our implementation method combines shared memory, for better data locality, with warp specialization, for higher instruction throughput. Our method achieves approximately 80% of the peak performance estimated by the roofline model, and still higher performance when temporal blocking is added.
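The roofline model referenced in the abstract bounds attainable throughput by min(peak compute, arithmetic intensity × memory bandwidth). The sketch below applies it to a hypothetical single-precision 7-point stencil; the FLOP and byte counts and the device parameters (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) are illustrative assumptions, not figures from the paper. The last lines show, under an idealized assumption that ignores halo overhead, why temporal blocking can exceed the single-step bound: fusing k time steps per pass over DRAM divides the traffic per point update by k.

```python
def roofline_gflops(flops_per_point, bytes_per_point, peak_gflops, bandwidth_gbs):
    """Roofline bound in GFLOP/s: min(compute roof, arithmetic intensity * bandwidth)."""
    intensity = flops_per_point / bytes_per_point  # FLOP per byte of DRAM traffic
    return min(peak_gflops, intensity * bandwidth_gbs)


# Hypothetical single-precision 7-point stencil (an assumption for illustration):
# 7 multiplies + 6 adds = 13 FLOP per point; with perfect on-chip reuse each
# point is read once and written once from DRAM, i.e. 2 * 4 bytes = 8 bytes.
FLOPS_PER_POINT = 13
BYTES_PER_POINT = 8

# Illustrative device parameters, NOT measurements from the paper.
PEAK_GFLOPS = 3950.0   # single-precision compute roof
BANDWIDTH_GBS = 250.0  # DRAM bandwidth

if __name__ == "__main__":
    bound = roofline_gflops(FLOPS_PER_POINT, BYTES_PER_POINT, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"single-step bound: {bound:.2f} GFLOP/s")        # 406.25
    print(f"80% of that bound: {0.8 * bound:.2f} GFLOP/s")  # 325.00
    # Idealized temporal blocking: fusing k steps per DRAM pass divides the
    # bytes moved per point update by k (halo overhead ignored).
    for k in (1, 2, 4):
        print(k, roofline_gflops(FLOPS_PER_POINT, BYTES_PER_POINT / k,
                                 PEAK_GFLOPS, BANDWIDTH_GBS))
```

With these assumed numbers the kernel is memory-bound, so the bound rises with k until it meets the compute roof; real halo traffic and redundant work keep actual gains below this ideal.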
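The shared-memory strategy mentioned in the abstract stages a tile of the grid, plus a one-cell halo, in fast on-chip memory so that neighboring threads reuse each loaded value instead of re-reading it from global memory. The CPU emulation below is only a sketch of that data flow for a 2D 5-point stencil: the `shared` list stands in for a thread block's shared-memory buffer, and the function names, tile size, and 0.2 averaging weight are hypothetical. It does not model warp specialization or any Kepler-specific tuning from the paper.

```python
def naive_step(grid):
    """Reference 5-point Jacobi step reading neighbors straight from the global grid.
    Boundary cells are copied unchanged."""
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = 0.2 * (grid[j][i] + grid[j][i - 1] + grid[j][i + 1]
                               + grid[j - 1][i] + grid[j + 1][i])
    return out


def tiled_step(grid, tile=4):
    """Same update, but each tile first copies its cells plus a one-cell halo into
    a small local buffer (standing in for CUDA shared memory) and reads only from it."""
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for by in range(1, ny - 1, tile):        # one iteration ~ one thread block
        for bx in range(1, nx - 1, tile):
            ey, ex = min(by + tile, ny - 1), min(bx + tile, nx - 1)
            # Cooperative load of tile interior plus halo; within this tile,
            # every value is fetched from the global grid exactly once.
            shared = [grid[y][bx - 1:ex + 1] for y in range(by - 1, ey + 1)]
            for j in range(by, ey):          # one iteration ~ one thread
                for i in range(bx, ex):
                    sj, si = j - by + 1, i - bx + 1  # local (shared) coordinates
                    out[j][i] = 0.2 * (shared[sj][si] + shared[sj][si - 1] + shared[sj][si + 1]
                                       + shared[sj - 1][si] + shared[sj + 1][si])
    return out


if __name__ == "__main__":
    g = [[float((3 * y + 5 * x) % 7) for x in range(11)] for y in range(9)]
    print(tiled_step(g, tile=4) == naive_step(g))  # True
```

Because both versions add the same neighbor values in the same order, the results match exactly; only the source of each read changes.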
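Temporal blocking, which the abstract credits with further gains, applies several time steps to a tile while it is resident on chip, so the grid makes one round trip through DRAM per k steps rather than per step. The sketch below uses one common variant, overlapped (ghost-zone) tiling on a 1D 3-point stencil, in which each tile loads k extra halo cells per side and recomputes them redundantly; the paper's actual scheme may differ, and all names here are hypothetical.

```python
def jacobi_step(u):
    """One step of a 3-point averaging stencil; the two end cells stay fixed."""
    if len(u) < 3:
        return u[:]
    return [u[0]] + [(u[i - 1] + u[i] + u[i + 1]) / 3.0
                     for i in range(1, len(u) - 1)] + [u[-1]]


def naive(u, steps):
    """Reference: every time step sweeps the whole array (one DRAM round trip per step)."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u


def temporally_blocked(u, steps, tile):
    """Overlapped temporal blocking: each tile is loaded once with a halo of
    `steps` cells per side, advanced `steps` time steps locally, and its
    interior written back once."""
    n = len(u)
    out = u[:]
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo, hi = max(0, start - steps), min(n, end + steps)
        local = u[lo:hi]  # single read of tile + halo
        for _ in range(steps):
            # Local end cells are held fixed; at an interior cut this is wrong,
            # but the error moves inward only one cell per step and stays in the halo.
            local = jacobi_step(local)
        out[start:end] = local[start - lo:end - lo]  # single write of the interior
    return out


if __name__ == "__main__":
    u0 = [float((7 * i) % 11) for i in range(40)]
    print(temporally_blocked(u0, steps=4, tile=8) == naive(u0, 4))  # True
```

A halo width equal to the number of fused steps keeps the tile interior exact; the price is redundant halo computation that grows with k, so the gain over single-step tiling is smaller than the idealized k-fold reduction in traffic.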


Similar resources

Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil

Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits their performance and power efficiency. In this paper, we accelerate the forward-modeling technique on the latest multi-co...


Performance of Kepler GTX Titan GPUs and Xeon Phi System

NVIDIA’s new architecture, Kepler improves GPU’s performance significantly with the new streaming multiprocessor SMX. Along with the performance, NVIDIA has also introduced many new technologies such as dynamic parallelism, Hyper-Q and GPUDirect with RDMA. Apart from other usual GPUs, NVIDIA also released another Kepler ‘GeForce’ GPU named GTX Titan. GeForce GTX Titan is not only good for gamin...


Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs

The Lattice Boltzmann method (LBM) for solving fluid flow is naturally well suited to an efficient implementation for massively parallel computing, due to the prevalence of local operations in the algorithm. This paper presents and analyses the performance of a 3D lattice Boltzmann solver, optimized for third generation nVidia GPU hardware, also known as ‘Kepler’. We provide a review of previou...


Early Experiences in Running Many-Task Computing Workloads on GPGPUs

This work aims to enable Swift to efficiently use accelerators (such as NVIDIA GPUs) to further accelerate a wide range of applications. This work presents preliminary results in the costs associated with managing and launching concurrent kernels on NVIDIA Kepler GPUs. We expect our results to be applicable to several XSEDE resources, such as Forge, Keeneland, and Lonestar, where currently Swif...


Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs

Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the Fermion Matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel® Xeon Phi™ to current Kepler-based NVIDIA® Tesla™ GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverti...


Publication date: 2013
